Universal Record Statistics of Random Walks and Lévy Flights

It is shown that statistics of records for time series generated by random walks are independent of the details
of the jump distribution, as long as the latter is continuous and symmetric. In $N$ steps, the mean of the record
distribution grows as $\sqrt{4N/\pi}$ while the standard deviation grows as $\sqrt{(2-4/\pi)N}$, so the distribution
is non-self-averaging. The mean shortest and longest duration records grow as $\sqrt{N/\pi}$ and $0.626508\ldots N$,
respectively. The case of a discrete random walker is also studied, and similar asymptotic behavior is found.

The study of record statistics is an integral part of diverse
fields including meteorology [1, 2], hydrology [3],
economics [4], sports [5, 6] and entertainment industries
among others. In popular media such as television or newspa- 
pers, one always hears and reads about record breaking events. 
It is no wonder that Guinness Book of Records has been a 
world's best-seller since 1955. In physics, records are relevant
in the theory of domain-wall dynamics [7], for example.
Consider any discrete time series $\{x_0, x_1, x_2, \ldots, x_N\}$ of $N$
entries that may represent, e.g., the daily temperatures in a
city or the stock prices of a company or the budgets of Hollywood
films. A record happens at step $i$ if the $i$-th entry $x_i$
is bigger than all previous entries $x_0, x_1, \ldots, x_{i-1}$. Statistical
questions that naturally arise are: (a) how many records occur
in time $N$? (b) how long does a record survive? (c) what
is the age of the longest surviving record? etc. Understanding
these aspects of record statistics is particularly important
in the context of current issues of climatology such as global
warming.
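
To make the definition concrete, here is a minimal sketch that extracts the record times and the ages of the records from an arbitrary series. The function name `record_times_and_ages` is hypothetical. Two conventions are assumed: the first entry counts as a record, and the age of the last record is the time remaining until the end of the series.

```python
from typing import List, Sequence, Tuple


def record_times_and_ages(x: Sequence[float]) -> Tuple[List[int], List[int]]:
    """Return (record_times, ages) for the series x[0], ..., x[N].

    Assumed conventions: x[0] counts as the first record, an entry is a
    record only if it is strictly larger than every earlier entry, and
    the last record's age runs to the end of the series.
    """
    if not x:
        raise ValueError("series must be non-empty")
    times = [0]
    best = x[0]
    for i in range(1, len(x)):
        if x[i] > best:  # strictly bigger than all previous entries
            best = x[i]
            times.append(i)
    # Age of a broken record: time until the next record occurs.
    ages = [later - earlier for earlier, later in zip(times, times[1:])]
    # Age of the last record: time until the end of the series.
    ages.append(len(x) - 1 - times[-1])
    return times, ages


times, ages = record_times_and_ages([3, 1, 4, 1, 5, 9, 2, 6])
print("record times:", times)
print("ages:", ages)
```

With these conventions the ages always add up to the index of the last entry, which is the sum rule used for the random-walk sequence later in the text.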

The mathematical theory of records has been studied for
over 50 years [8, 9, 10, 11, 12] and the questions posed in the
previous paragraph are well understood in the case when the 
random variables $x_i$'s are independent and identically distributed
(iid). Recently, there has been a resurgence of interest
in the record theory due to its multiple applications in
diverse complex systems such as spin glasses [13], adaptive
processes [14] and evolutionary models of biological populations
[15, 16]. The results in the record theory of iid variables
have been rather useful in these different contexts. Recently,
Krug has studied the record statistics when the entries
have non-identical distributions but still retain their independence
[17]. However, in most realistic situations the entries
of the time series are correlated. Surprisingly, very little is 
known about the statistics of records for a correlated time se- 
ries. In this Letter we take a step towards this goal. 

Of correlated time series $\{x_0, x_1, x_2, \ldots, x_N\}$, perhaps the
simplest and yet the most common with a variety of applications
[18], is the one where $x_i$ represents the position of a
random walker at discrete time $i$. The walker starts at $x_0$ at
time 0 and at each discrete step evolves via $x_i = x_{i-1} + \eta_i$
where the noise $\eta_i$ represents the jump length at step $i$. The
jump lengths $\eta_i$'s are iid variables each drawn from a symmetric
distribution $\phi(\eta)$. This also includes Lévy flights where
$\phi(\eta) \sim |\eta|^{-1-\mu}$ is power-law distributed for large $|\eta|$ with
exponent $0 < \mu < 2$ and thus has a divergent second moment.
Even though the jump lengths are uncorrelated, the entries
$x_i$'s are clearly correlated. This time series, corresponding
to a discrete-time Brownian motion, appears naturally in
many different contexts. For example, in the context of queuing
theory [19], $x_i$ represents the length of a single queue at
time $i$. In the context of the evolution of stock prices, $x_i$ represents
the logarithm of the price of a stock at time $i$ [20]. In
this Letter, we compute exactly the statistics of the number
and the ages of records in this correlated sequence and show
that the record statistics is universal, i.e., independent of the
noise distribution $\phi(\eta)$ as long as $\phi(\eta)$ is symmetric and continuous.

It is useful to summarize our main results. The record statistics
are independent of the starting position $x_0$ and hence without
any loss of generality we will set $x_0 = 0$ and also count
the initial entry $x_0 = 0$ as the first record. We show that the
probability $P(M, N)$ of $M$ records in $N$ steps ($M \le N + 1$)
is simply

$$P(M,N) = \binom{2N-M+1}{N}\, 2^{-2N+M-1}, \qquad (1)$$

which is universal for all $M$ and $N$. The moments are also
naturally universal and can be computed for all $N$. In particular,
for large $N$, the mean and the variance behave as

$$\langle M \rangle \sim \frac{2}{\sqrt{\pi}}\,\sqrt{N}, \qquad \langle M^2 \rangle - \langle M \rangle^2 \sim 2\left(1 - \frac{2}{\pi}\right) N, \qquad (2)$$

while the skewness, defined as the third central moment divided
by the variance raised to the 3/2-power, goes to a constant
value $4(4-\pi)(2\pi-4)^{-3/2}$. We also show that the
age statistics of the records is universal for all $N$. Evidently,
the mean age of a typical record grows, for large $N$, as
$\langle l \rangle \sim N/\langle M \rangle \sim \sqrt{\pi N/4} \approx 0.8862\,\sqrt{N}$. We also compute
the extreme age statistics, i.e., ages of the records that have
respectively the shortest and the longest duration. These extreme
statistics are also universal. While the mean longevity
of the record with the shortest age grows, for large $N$, as
$\langle l_{\min} \rangle \sim \sqrt{N/\pi} \approx 0.5642\,\sqrt{N}$, that of the longest age grows
faster, $\langle l_{\max} \rangle \sim c\,N$, where $c$ is a nontrivial universal constant

$$c = 2 \int_0^{\infty} dy\, \log\left[1 + \frac{1}{2\sqrt{\pi}}\,\Gamma(-1/2, y)\right] = 0.626508\ldots, \qquad (3)$$

where $\Gamma(-1/2, y) = \int_y^{\infty} dx\, x^{-3/2} e^{-x}$. The universality of
these results can be traced back to the Sparre Andersen theorem
on the first-passage property of random walks.

Let us consider any realization of the random walk sequence
$\{x_0 = 0, x_1, x_2, \ldots, x_N\}$ (see Fig. 1), where $x_i = x_{i-1} + \eta_i$
and $\eta_i$'s are iid variables each drawn from the distribution
$\phi(\eta)$. Let $M$ be the number of records in this realization.
Let $\vec{l} = \{l_1, l_2, \ldots, l_M\}$ denote the time intervals between
successive records. Thus $l_i$ is the age of the $i$-th record,
i.e., it denotes the time up to which the $i$-th record survives.
Note that the last record, i.e., the $M$-th record, still stays a
record at the $N$-th step since there are no more record breaking
events after it. Our aim is to first calculate the joint probability
distribution $P(\vec{l}, M|N)$ of the ages $\vec{l}$ and the number
$M$ of records, given the length $N$ of the sequence. For this,
we need two quantities as inputs. First, let $q(l)$ denote the
probability that a walk, starting initially at $x$, stays above (or
below) its starting position $x$ up to step $l$. Clearly $q(l)$ does not
depend on the starting position $x$. A nontrivial theorem due to
Sparre Andersen [21] states that $q(l) = \binom{2l}{l}\, 2^{-2l}$ is universal
for all $l$, i.e., independent of $\phi(\eta)$ as long as $\phi(\eta)$ is symmetric
and continuous. Its generating function is simply

$$\tilde{q}(z) = \sum_{l=0}^{\infty} q(l)\, z^l = \frac{1}{\sqrt{1-z}}. \qquad (4)$$

Our second input is the first-passage probability $f(l)$ that the
walker crosses its starting point $x$ for the first time between
steps $(l-1)$ and $l$. Evidently, $f(l) = q(l-1) - q(l)$ with
$l \ge 1$ is also universal and its generating function is

$$\tilde{f}(z) = \sum_{l=1}^{\infty} f(l)\, z^l = 1 - (1-z)\,\tilde{q}(z) = 1 - \sqrt{1-z}. \qquad (5)$$
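
The universality of $q(l)$ is easy to probe numerically. The following is a minimal Monte Carlo sketch, not the authors' code: it estimates the probability that a walk stays strictly above its starting point for $l = 5$ steps, for three symmetric continuous jump distributions, and compares the estimates with $\binom{2l}{l} 2^{-2l}$. The helper names, the sample size and the seed are illustrative assumptions.

```python
import math
import random


def survival_estimate(draw_jump, steps, samples, rng):
    """Fraction of walks started at 0 that stay strictly positive for `steps` steps."""
    survived = 0
    for _ in range(samples):
        x = 0.0
        alive = True
        for _ in range(steps):
            x += draw_jump(rng)
            if x <= 0.0:  # the walk returned to (or below) its starting point
                alive = False
                break
        survived += alive
    return survived / samples


def q_exact(l):
    """Sparre Andersen survival probability C(2l, l) / 4^l."""
    return math.comb(2 * l, l) / 4 ** l


# Three symmetric, continuous jump distributions (illustrative choices).
JUMPS = {
    "uniform": lambda rng: rng.uniform(-0.5, 0.5),
    "gaussian": lambda rng: rng.gauss(0.0, 1.0),
    # Cauchy jumps by inverse-transform sampling.
    "cauchy": lambda rng: math.tan(math.pi * (rng.random() - 0.5)),
}

rng = random.Random(12345)
estimates = {name: survival_estimate(draw, 5, 20000, rng) for name, draw in JUMPS.items()}
for name, value in estimates.items():
    print(f"{name:8s}  q(5) estimate {value:.4f}   exact {q_exact(5):.4f}")
```

Within statistical error the three estimates should coincide, even though the Cauchy jumps have no finite variance.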
Armed with these two ingredients $q(l)$ and $f(l)$, one can
then write down explicitly the joint distribution of the ages $\vec{l}$
and the number $M$ of records

$$P(\vec{l}, M|N) = f(l_1)\, f(l_2) \cdots f(l_{M-1})\, q(l_M)\, \delta_{\sum_{i=1}^{M} l_i,\, N}, \qquad (6)$$

where we have used the Markov property of random walks
which dictates that the successive intervals are statistically independent,
subject to the global sum rule that the total interval
length is $N$ (see Fig. 1). Note that since the $M$-th record
is the last one (i.e., no more records have happened after it),
the interval to its right has distribution $q(l)$ rather than $f(l)$.

FIG. 1: A realization of the random-walk sequence $\{x_0 = 0, x_1, x_2, \ldots, x_N\}$
of $N$ steps with $M$ records. Records are shown
as black dots. $\{l_1, l_2, \ldots, l_M\}$ denotes the time intervals between
successive records.



One can check that $P(\vec{l}, M|N)$ is normalized to unity when
summed over $\vec{l}$ and $M$. Since $q(l)$ and $f(l)$ are universal due
to the Sparre Andersen theorem, it follows that $P(\vec{l}, M|N)$
and any of its marginals are also universal.
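
The normalization and the marginal over the ages can be checked exactly for small $N$. The sketch below sums the product form of Eq. (6) over all ages with exact rational arithmetic. The function names are hypothetical, and the binomial expression in `closed_form` is an assumed closed form for Eq. (1), which the code tests rather than takes for granted.

```python
from fractions import Fraction
from math import comb


def q(l):
    """Sparre Andersen survival probability q(l) = C(2l, l) / 4^l."""
    return Fraction(comb(2 * l, l), 4 ** l)


def f(l):
    """First-passage probability f(l) = q(l-1) - q(l), for l >= 1."""
    return q(l - 1) - q(l)


def record_number_distribution(N):
    """Return {M: P(M|N)} obtained by summing Eq. (6) over all ages."""
    # weight[n]: total weight of the completed intervals so far, given that
    # together they use exactly n steps (no interval yet means n = 0).
    weight = [Fraction(1)] + [Fraction(0)] * N
    dist = {}
    for M in range(1, N + 2):
        # The M-th (last) record must survive the remaining N - n steps.
        dist[M] = sum(weight[n] * q(N - n) for n in range(N + 1))
        # Append one more completed interval of length n - m >= 1.
        weight = [sum(weight[m] * f(n - m) for m in range(n)) for n in range(N + 1)]
    return dist


def closed_form(M, N):
    """Assumed closed form for Eq. (1): C(2N-M+1, N) * 2^(-2N+M-1)."""
    return Fraction(comb(2 * N - M + 1, N), 2 ** (2 * N - M + 1))


N = 8
dist = record_number_distribution(N)
print("normalization:", sum(dist.values()))
print("matches closed form:", all(dist[M] == closed_form(M, N) for M in dist))
```

Because only $q(l)$ and $f(l)$ enter, nothing in this computation refers to the jump distribution, which is the universality statement in computational form.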

Let us first compute the probability of the number of
records $M$, $P(M|N) = \sum_{\vec{l}} P(\vec{l}, M|N)$. To perform this
sum, it is easier to consider its generating function. Multiplying
Eq. (6) by $z^N$ and summing over $\vec{l}$ and $N$, one gets

$$\sum_{N=M-1}^{\infty} P(M|N)\, z^N = [\tilde{f}(z)]^{M-1}\, \tilde{q}(z) = \frac{\left(1 - \sqrt{1-z}\right)^{M-1}}{\sqrt{1-z}}. \qquad (7)$$

By expanding in powers of $z$ and computing the coefficient
of $z^N$, we get our first result in Eq. (1). One can also easily
derive the moments of $M$ from Eq. (7). For example, for the
first three moments we get

$$\langle M \rangle = (2N+1) \binom{2N}{N} 2^{-2N}, \qquad \langle M^2 \rangle = 2N + 2 - \langle M \rangle, \qquad \langle M^3 \rangle = -6N - 6 + (7 + 4N)\, \langle M \rangle. \qquad (8)$$

The large-$N$ behavior in Eq. (2) can then be easily derived
from Eq. (8) by using Stirling's approximation. In Fig. 2,
we demonstrate this universality by computing from simulations
$\langle M \rangle$ for three different distributions $\phi(\eta)$: (i) uniform in
$[-1/2, 1/2]$, (ii) Gaussian with zero mean and unit variance,
and (iii) Cauchy or Lorentzian, $\phi(\eta) = \pi^{-1}/(1 + \eta^2)$, which
is an example of a Lévy flight. We then compare the data with
the exact formula in Eq. (8). The agreement is excellent and
one cannot distinguish between the four curves for any value
of $N$.
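
A small-scale version of this comparison can be sketched as follows. This is an illustrative simulation rather than the one used for Fig. 2: the sample size, the seed and the helper names are assumptions. The exact mean is computed here as $\sum_{k=0}^{N} q(k)$, on the assumption that step $k$ is a record with probability $q(k)$ (a time-reversal argument not spelled out in the text).

```python
import math
import random


def count_records(N, draw_jump, rng):
    """Records in one N-step walk x_0 = 0, x_i = x_{i-1} + eta_i (x_0 counts as the first)."""
    x = best = 0.0
    records = 1
    for _ in range(N):
        x += draw_jump(rng)
        if x > best:
            best = x
            records += 1
    return records


def simulated_mean(N, draw_jump, samples, rng):
    return sum(count_records(N, draw_jump, rng) for _ in range(samples)) / samples


def exact_mean(N):
    # Assumption: step k is a record with probability q(k) = C(2k, k) / 4^k,
    # so <M> = sum_{k=0}^{N} q(k).
    return sum(math.comb(2 * k, k) / 4 ** k for k in range(N + 1))


JUMPS = {
    "uniform": lambda rng: rng.uniform(-0.5, 0.5),
    "gaussian": lambda rng: rng.gauss(0.0, 1.0),
    "cauchy": lambda rng: math.tan(math.pi * (rng.random() - 0.5)),
}

N, SAMPLES = 100, 4000
rng = random.Random(7)
means = {name: simulated_mean(N, draw, SAMPLES, rng) for name, draw in JUMPS.items()}
for name, value in means.items():
    print(f"{name:8s}  <M> = {value:.3f}")
print(f"exact     <M> = {exact_mean(N):.3f}")
print(f"large-N   <M> = {math.sqrt(4 * N / math.pi):.3f}")
```

The three simulated means should agree with each other and with the exact value within sampling error, while the large-$N$ form $\sqrt{4N/\pi}$ is already close at $N = 100$.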

FIG. 2: (color online). The top curve actually contains four different
curves denoting $\langle M \rangle$ vs $N$ for (i) uniform, (ii) Gaussian, (iii) Cauchy
distributions for $\phi(\eta)$ and also (iv) the exact result in Eq. (8). The
four curves are indistinguishable. The bottom curve shows $\langle M \rangle$ vs
$N$ for the lattice random walk with $\pm 1$ steps, i.e., when $\phi(\eta) =
[\delta_{\eta,1} + \delta_{\eta,-1}]/2$, and agrees with Eq. (13).

It is also interesting to compare this statistics of $M$ for the
random-walk sequence with that of the iid sequence where



each entry $x_i$ is a random variable drawn from some distribution
$p(x)$. In the latter case, it is well known that
the distribution of the number of records $P(M|N)$ does not
depend on $p(x)$, and for large $N$, it approaches a Gaussian,
$P(M|N) \sim \exp[-(M - \log N)^2/(2 \log N)]$, with mean
$\langle M \rangle = \log N$ and the standard deviation $\sigma = \sqrt{\log N}$.
Thus, fluctuations of $M$ are small compared to the mean for
large $N$. In contrast, for the random-walk sequence, it follows
from Eq. (2) that both the mean and the standard deviation
grow as $\sqrt{N}$ for large $N$ and thus the fluctuations are
large and comparable to the mean. This suggests that in the
random-walk case $P(M|N)$ has a scaling form for large $M$
and $N$, $P(M|N) \sim N^{-1/2}\, g(M N^{-1/2})$. One can indeed
prove this by analysing Eq. (1) in the scaling limit and finds

$$g(x) = \frac{e^{-x^2/4}}{\sqrt{\pi}}.$$
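
As a consistency check on this scaling form (a short worked calculation, assuming $x = M N^{-1/2}$ ranges over $x \ge 0$), the first moments of $g(x)$ can be evaluated directly:

```latex
\begin{align*}
\int_0^{\infty} g(x)\, dx &= \frac{1}{\sqrt{\pi}} \int_0^{\infty} e^{-x^2/4}\, dx = 1, \\
\langle x \rangle &= \frac{1}{\sqrt{\pi}} \int_0^{\infty} x\, e^{-x^2/4}\, dx = \frac{2}{\sqrt{\pi}}, \\
\langle x^2 \rangle &= \frac{1}{\sqrt{\pi}} \int_0^{\infty} x^2\, e^{-x^2/4}\, dx = 2, \\
\langle x^2 \rangle - \langle x \rangle^2 &= 2 - \frac{4}{\pi}.
\end{align*}
```

Multiplying by the appropriate powers of $\sqrt{N}$ gives a mean of $\sqrt{4N/\pi}$ and a standard deviation of $\sqrt{(2 - 4/\pi) N}$, in agreement with the large-$N$ growth laws quoted at the start of the Letter.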

While the typical age of a record grows as $\langle l \rangle \sim N/\langle M \rangle \sim
N^{1/2}$ for large $N$, there are rare records whose ages follow
different statistics. For example, what is the age distribution of
the longest lasting and the shortest lasting records? These extreme
statistics of ages can also be derived from the joint distribution
in Eq. (6) and hence they are independent of $\phi(\eta)$.

We first consider the longest lasting record with age $l_{\max} =
\max(l_1, l_2, \ldots, l_M)$. It is easier to compute its cumulative distribution
$F(n|N)$, i.e., the probability that $l_{\max} < n$ given $N$.
Now, if $l_{\max} < n$, it follows that $l_i < n$ for $i = 1, 2, \ldots, M$.
Thus, we need to sum up Eq. (6) over all $l_i$'s and $M$ such
that $l_i < n$ for each $i$. As usual it is easier to carry out this
summation by considering the generating function and we get

$$\sum_{N} F(n|N)\, z^N = \ldots \qquad (9)$$

Extracting the distribution $F(n|N)$ from this general expression
is somewhat cumbersome and we do not present the details
here [25]. However, one can extract the asymptotic large-$N$
behavior of the average $\langle l_{\max} \rangle = \sum_{n=1}^{\infty} [1 - F(n|N)]$ from
Eq. (9) using the explicit form of $q(l)$ and $f(l)$. Skipping
details [25], we find that for large $N$, the mean age of the
longest lasting record grows linearly with $N$, $\langle l_{\max} \rangle \sim c\,N$,
where $c = 0.626508\ldots$ is a universal constant given in Eq.
(3). Thus, the age of the longest record ($\sim N$) is much larger
than the typical age ($\sim \sqrt{N}$) for large $N$. Interestingly, exactly
the same constant $c$ has appeared before in a different context.
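
The value of $c$ can be reproduced by direct numerical integration. The sketch below assumes that Eq. (3) reads $c = 2 \int_0^\infty dy\, \log[1 + \Gamma(-1/2, y)/(2\sqrt{\pi})]$ and uses the identity $\Gamma(-1/2, y) = 2 e^{-y}/\sqrt{y} - 2\sqrt{\pi}\, \mathrm{erfc}(\sqrt{y})$, which follows from one integration by parts. The substitution $y = t^2$, the cutoff and the grid size are illustrative choices.

```python
import math


def incomplete_gamma_minus_half(y):
    """Gamma(-1/2, y) = int_y^inf x^(-3/2) e^(-x) dx, via one integration by parts."""
    root = math.sqrt(y)
    return 2.0 * math.exp(-y) / root - 2.0 * math.sqrt(math.pi) * math.erfc(root)


def longest_record_constant(t_max=8.0, n=200_000):
    """Midpoint rule for c = 2 int_0^inf log[1 + Gamma(-1/2, y) / (2 sqrt(pi))] dy.

    The substitution y = t^2 (dy = 2 t dt) tames the logarithmic
    singularity of the integrand at y = 0.
    """
    h = t_max / n
    norm = 2.0 * math.sqrt(math.pi)
    total = 0.0
    for k in range(n):
        t = (k + 0.5) * h
        total += 2.0 * t * math.log1p(incomplete_gamma_minus_half(t * t) / norm)
    return 2.0 * total * h


c = longest_record_constant()
print(f"c = {c:.6f}")
```

The integrand decays like $e^{-y}$ at large $y$, so truncating at $t = 8$ (that is, $y = 64$) costs nothing at double precision.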

The statistics of the longest record for iid variables follows
a similar asymptotic behavior $\langle l_{\max} \rangle \sim c_1 N$ but with the prefactor [25]

$$c_1 = \int_0^{\infty} dx\, \exp\left[-x - \int_x^{\infty} dy\, y^{-1} e^{-y}\right] = 0.624330\ldots, \qquad (10)$$

which also describes the asymptotic linear growth of the 
longest cycle of a random permutation and is known as the 
Golomb-Dickman or Goncharov's constant (see [22]). This 
result for iid variables also emerged recently in the context 
of a growing network model [26]. Interestingly, the con- 
stant c = 0.626508.. for random walks is quite close to the 
Golomb-Dickman constant. It turns out that although the two 
problems (iid variables and random walks) have some com- 
mon features (at least qualitatively), the origin of universality 
is quite different in the two problems [25]. 

For the record of the shortest duration $l_{\min} =
\min(l_1, l_2, \ldots, l_M)$, one finds that the generating function of the
cumulative distribution $G(n|N)$ denoting the probability that
$l_{\min} > n$ is given by
(11)



One can then extract, in a similar way, the asymptotic large-$N$
behavior of $\langle l_{\min} \rangle \sim \sqrt{N/\pi}$ [25]. Thus, the mean age
of the shortest lasting record grows in a similar way as that
of a typical record, albeit with a smaller prefactor $1/\sqrt{\pi} =
0.5642\ldots$ compared with $\sqrt{\pi/4} = 0.8862\ldots$, respectively.

FIG. 3: (color online). Plot of simulation results for $\langle l_{\min} \rangle/\sqrt{N}$
vs. $1/\sqrt{N}$ (blue data falling on the steeper curve) and $\langle l_{\max} \rangle/N$ vs.
$1/N$ (red data falling on the less-steep curve), showing the asymptotic
behavior of these two quantities. Linear fits to the data for
$500 < N < 10000$ yield the straight lines, whose equations are
displayed.

We have verified the results for $\langle l_{\min} \rangle$ and $\langle l_{\max} \rangle$ numerically
for the case of jump distribution $\phi(\eta)$ uniform in
$[-1/2, 1/2]$, simulating $10^9$ samples containing $10^4$ steps
each. We kept track of the largest and smallest interval between
records (including the final incomplete time interval)
for each value of $N$, and calculated the average over all
the runs. The results are shown in Fig. 3 where we plot
$\langle l_{\min} \rangle/\sqrt{N}$ and $\langle l_{\max} \rangle/N$, in the first case vs. $1/\sqrt{N}$, and
in the second case vs. $1/N$; making plots this way, we find
that the data falls on a nearly straight line as $N \to \infty$ in each
case. The intercepts, 0.56480 and 0.62652, agree closely with
the predictions, $\sqrt{1/\pi} = 0.564190\ldots$ and 0.626508, respectively.
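
A much smaller version of this check can be sketched as follows. The sample size, $N$ and seed are illustrative assumptions, far below the statistics quoted above. The final incomplete interval is taken to be $N$ minus the time of the last record, so it can be zero when the last step is itself a record.

```python
import math
import random


def record_ages(N, rng):
    """Ages l_1, ..., l_M for one N-step walk with jumps uniform in [-1/2, 1/2]."""
    x = best = 0.0
    last_record = 0
    ages = []
    for i in range(1, N + 1):
        x += rng.uniform(-0.5, 0.5)
        if x > best:
            best = x
            ages.append(i - last_record)  # age of the record just broken
            last_record = i
    ages.append(N - last_record)  # final, still unbroken record (assumed convention)
    return ages


def mean_extreme_ages(N, samples, seed=2024):
    """Monte Carlo estimates of (<l_min>, <l_max>) over `samples` walks."""
    rng = random.Random(seed)
    sum_min = sum_max = 0
    for _ in range(samples):
        ages = record_ages(N, rng)
        sum_min += min(ages)
        sum_max += max(ages)
    return sum_min / samples, sum_max / samples


N, SAMPLES = 500, 2000
l_min, l_max = mean_extreme_ages(N, SAMPLES)
print(f"<l_max>/N       = {l_max / N:.4f}   (prediction 0.626508)")
print(f"<l_min>/sqrt(N) = {l_min / math.sqrt(N):.4f}   (prediction {1 / math.sqrt(math.pi):.4f})")
```

At this sample size the estimate of $\langle l_{\max} \rangle / N$ is already close to $c$. The estimate of $\langle l_{\min} \rangle / \sqrt{N}$ is much noisier and carries finite-$N$ corrections, which is consistent with the need for very large samples and an extrapolation in $1/\sqrt{N}$.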

We also considered the discrete (non-continuous) case
where the walk jumps by $\eta = \pm 1$ at each time step. For this
case we find

$$\sum_{N=0}^{\infty} \langle M \rangle\, z^N = \frac{\sqrt{1+z} + \sqrt{1-z}}{2\,(1-z)^{3/2}}, \qquad (12)$$

which implies

$$\langle M \rangle = 2\,\ldots \qquad (13)$$

where ${}_2F_1$ is the hypergeometric function, implying $\langle M \rangle =
1, 3/2, 7/4, 2, 35/16$ for $N = 0, 1, 2, 3, 4$. For large $N$,
$\langle M \rangle \sim \sqrt{2N/\pi}$, which is $1/\sqrt{2}$ of the expression for the
mean in the continuous case. We also find $\langle l_{\max} \rangle \sim c\,N$, and
$\langle l_{\min} \rangle \sim \sqrt{2N/\pi}$, which are respectively equal to, and $\sqrt{2}$
times, the corresponding expressions for the continuous case.
These results were also verified in a simulation.
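
The small-$N$ values quoted for the lattice walk can be reproduced by brute-force enumeration. The sketch below assumes that a record requires a strictly larger position than all previous ones, so ties do not count, and that $x_0$ counts as the first record. The function name is hypothetical.

```python
from fractions import Fraction
from itertools import product


def mean_records_lattice(N):
    """Exact <M> for the +/-1 walk, by enumerating all 2^N equally likely paths."""
    total = 0
    for steps in product((1, -1), repeat=N):
        x = best = 0
        records = 1  # x_0 counts as the first record
        for s in steps:
            x += s
            if x > best:  # strict inequality: ties are not records
                best = x
                records += 1
        total += records
    return Fraction(total, 2 ** N)


values = [mean_records_lattice(N) for N in range(5)]
print(", ".join(str(v) for v in values))
```

The enumeration cost grows as $2^N$, so this is only a check of the first few values, not a route to the large-$N$ behavior.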

In conclusion, we have shown that the record statistics of 
a time series generated by a Markov process (random walk) 
are independent of the details of the walk distribution when 
that distribution is continuous and symmetric. Walks with a 
discrete jump distribution show similar asymptotic behavior 
but in general with different coefficients. The results should 
be useful in analyzing a broad class of physical phenomena 



and are relevant for example to analyzing questions of cli- 
mate change. A possible future problem is the calculation 
of record statistics for non-symmetric random jumps (with a 
drift) - such as would be the case for a global warming trend.